Estimating Frequency Moments of Data Streams Using Random Linear Combinations

Author

  • Sumit Ganguly
Abstract

The problem of estimating the kth frequency moment $F_k$, for any nonnegative k, over a data stream by looking at each item exactly once as it arrives was considered in a seminal paper by Alon, Matias and Szegedy [1, 2]. The space complexity of their algorithm is $\tilde{O}(n^{1-1/k})$. For k > 2, their technique does not apply to data streams with arbitrary insertions and deletions. In this paper, we present an algorithm for estimating $F_k$ for k > 2 over general update streams whose space complexity is $\tilde{O}(n^{1-1/(k-1)})$ and whose time complexity for processing each stream update is $\tilde{O}(1)$. Recently, Coppersmith and Kumar [7] published an algorithm for estimating $F_k$ over general update streams with similar space complexity. Our technique (a) is fundamentally different from the technique used in [7], (b) is simpler and symmetric, and (c) is significantly more efficient in terms of the time required to process a stream update ($\tilde{O}(1)$ compared with $\tilde{O}(n^{1-1/(k-1)})$).
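The abstract does not spell out the estimator itself. As a hedged illustration only, the Python sketch below shows one standard way a random linear combination can estimate $F_k$ over a stream with insertions and deletions; the construction is assumed here, not taken from the paper. It maintains Z = Σ_i f_i x_i, where each x_i is a uniformly random kth root of unity. Because E[x^m] = 0 for 1 ≤ m < k and x^k = 1, every cross term in the expansion of Z^k vanishes in expectation (given k-wise independence), leaving E[Z^k] = Σ_i f_i^k = F_k. All names (`frequencies`, `RootsOfUnitySketch`, `estimate_fk`, `exact_expectation`) are hypothetical. The roots are stored explicitly rather than generated by a limited-independence hash, so this toy version uses linear space.

```python
import cmath
import itertools
import random
from collections import defaultdict


def frequencies(updates):
    """Net frequency f_i of each item in a stream of (item, delta) updates."""
    freq = defaultdict(int)
    for item, delta in updates:
        freq[item] += delta  # a deletion is simply a negative delta
    return dict(freq)


def exact_fk(updates, k):
    """Exact F_k = sum_i f_i^k, used only to check the estimates."""
    return sum(f ** k for f in frequencies(updates).values())


class RootsOfUnitySketch:
    """Hypothetical sketch keeping one linear combination Z = sum_i f_i * x_i.

    Each x_i is a uniformly random k-th root of unity. For simplicity the roots
    are drawn fully independently and memoised in a dict (O(n) space); a
    space-efficient version would derive x_i from a k-wise independent hash.
    """

    def __init__(self, k, rng):
        self.k = k
        self.rng = rng
        self.roots = {}  # item -> its fixed random k-th root of unity
        self.z = 0j      # the linear combination Z

    def _root(self, item):
        if item not in self.roots:
            j = self.rng.randrange(self.k)
            self.roots[item] = cmath.exp(2j * cmath.pi * j / self.k)
        return self.roots[item]

    def update(self, item, delta):
        # Z is linear in the frequencies, so insertions and deletions are
        # handled identically with O(1) work for this copy.
        self.z += delta * self._root(item)

    def estimate(self):
        # Unbiased: E[Z^k] = F_k, so E[Re(Z^k)] = F_k as well.
        return (self.z ** self.k).real


def estimate_fk(updates, k, copies=4000, seed=0):
    """Average Re(Z^k) over independent copies to reduce the variance."""
    rng = random.Random(seed)
    sketches = [RootsOfUnitySketch(k, rng) for _ in range(copies)]
    for item, delta in updates:
        for sketch in sketches:
            sketch.update(item, delta)
    return sum(s.estimate() for s in sketches) / copies


def exact_expectation(freq, k):
    """E[Re(Z^k)] computed by enumerating all k^n root choices (tiny n only)."""
    items = list(freq)
    roots = [cmath.exp(2j * cmath.pi * j / k) for j in range(k)]
    total = 0.0
    for choice in itertools.product(roots, repeat=len(items)):
        z = sum(freq[i] * x for i, x in zip(items, choice))
        total += (z ** k).real
    return total / k ** len(items)


updates = [("a", 3), ("b", 2), ("a", -1), ("c", 1)]  # net f: a=2, b=2, c=1
print(exact_fk(updates, 3))                           # 8 + 8 + 1 = 17
print(exact_expectation(frequencies(updates), 3))     # 17 up to rounding
print(estimate_fk(updates, 3))                        # close to 17
```

Averaging many independent copies is only a crude way to control the variance; reaching the space bound stated in the abstract would require further techniques not shown here, so the sketch should be read as an illustration of why a random linear combination is unaffected by deletions.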


Similar resources

Better Bounds for Frequency Moments in Random-Order Streams

Estimating frequency moments of data streams is a very well-studied problem [1–3,9,12] and tight bounds are known on the amount of space that is necessary and sufficient when the stream is adversarially ordered. Recently, motivated by various practical considerations and applications in learning and statistics, there has been growing interest in studying streams that are randomly ordered [3,4...


A Very Efficient Scheme for Estimating Entropy of Data Streams Using Compressed Counting

Compressed Counting (CC) was recently proposed for approximating the αth frequency moments of data streams, for 0 < α ≤ 2. Under the relaxed strict-Turnstile model, CC dramatically improves the standard algorithm based on symmetric stable random projections, especially as α → 1. A direct application of CC is to estimate the entropy, which is an important summary statistic in Web/network measure...


Entropy Estimations Using Correlated Symmetric Stable Random Projections

Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detection (e.g., DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is...


Revisiting Frequency Moment Estimation in Random Order Streams

We revisit one of the classic problems in the data stream literature, namely, that of estimating the frequency moments $F_p$ for $0 < p < 2$ of an underlying n-dimensional vector presented as a sequence of additive updates in a stream. It is well-known that using p-stable distributions one can approximate any of these moments up to a multiplicative $(1+\epsilon)$-factor using $O(\epsilon^{-2}\log n)$ bits of space, and...


Estimating Entropy of Data Streams Using Compressed Counting

The Shannon entropy is a widely used summary statistic in, for example, network traffic measurement, anomaly detection, neural computations, spike trains, etc. This study focuses on estimating Shannon entropy of data streams. It is known that Shannon entropy can be approximated by Rényi entropy or Tsallis entropy, which are both functions of the αth frequency moments and approach Shannon entropy a...



Publication date: 2004